METHOD FOR DETERMINATION OF SPATIAL TARGET PROBABILITY
USING A MODEL OF MULTISENSORY PROCESSING BY THE BRAIN

FIELD OF INVENTION
The present invention generally relates to a method for determining the 
probability that an event has occurred at a set of spatially localized positions in the 
environment, and more particularly to modeling of multisensory processing in brain maps. 



BACKGROUND

All vertebrate animals constantly monitor the environment by orienting their
sensory organs toward the locations of events of potential survival value. Neurobiological
evidence indicates that animals utilize multisensory integration to detect the targets of
orienting movements. It further indicates that the ability to integrate multisensory input is
innate, and emerges as the developing brain interacts with the environment.

The superior colliculus (SC) is a major site of multisensory integration in
the mammalian brain. The SC, as shown in FIG. 1, is located in the mammalian midbrain,
and is homologous to the optic tectum of non-mammals. On grounds of differing
connectivity and function, it can be divided into superficial and deep layers. The deep SC
integrates multisensory input and participates in the generation of saccadic (rapid) eye
movements. The superficial SC receives only visual input and does not participate in
saccade generation.

The deep SC in mammals receives convergent inputs from the visual,
auditory, and somatosensory systems. Sensory input arrives from many sub-cortical and
extra-primary cortical regions of the brain. The deep SC sends its outputs to premotor
circuits in the brainstem and spinal cord that control movements of the eyes and other
structures. Neurons in the SC are organized topographically according to their receptive
fields. Maps for the various sensory modalities are in register. The motor output of the
SC is also topographically organized. Activation of neurons in a localized region of the
SC leads, for example, to a saccade of a stereotyped direction and magnitude.

Multisensory enhancement (MSE) is a dramatic form of multisensory
integration, in which the response of an SC neuron to an input of one sensory modality
can be greatly increased by input of another sensory modality. MSE was first identified in
the optic tectum of the rattlesnake, where visual and infrared stimuli can affect the activity
of the same neurons. Percent multisensory enhancement is computed as:

%MSE = [(CM - SM_max) / SM_max] x 100     (1)

where CM is the combined-modality response and SM_max is the larger of the two
unimodal responses. Percent MSE can range upwards of 1000%. Percent MSE is larger
when the single-modality responses are smaller. This property is known as inverse
effectiveness.
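As a small illustration of equation (1) and of inverse effectiveness, the following Python sketch computes percent MSE for two sets of responses. The function name percent_mse and the response values are hypothetical and chosen only for illustration; they are not measurements from the source.

```python
def percent_mse(cm: float, sm_a: float, sm_b: float) -> float:
    """Percent multisensory enhancement, following equation (1)."""
    sm_max = max(sm_a, sm_b)  # larger of the two single-modality responses
    if sm_max <= 0:
        raise ValueError("SM_max must be positive for %MSE to be defined")
    return (cm - sm_max) / sm_max * 100.0


# Hypothetical responses (for example, spike counts), for illustration only.
strong = percent_mse(cm=30.0, sm_a=20.0, sm_b=15.0)  # strong unimodal responses
weak = percent_mse(cm=12.0, sm_a=2.0, sm_b=1.0)      # weak unimodal responses
print(strong, weak)
```

With these illustrative numbers the weaker unimodal responses give the larger percent enhancement, which is the pattern the text calls inverse effectiveness.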

MSE is dependent upon the spatial and temporal relationships of the
interacting stimuli. Stimuli that occur at the same time and place are likely to produce
response enhancement, while stimuli that occur at different times and/or places are not
likely to produce enhancement. MSE is also observed at the behavioral level. For
example, a cat is much more likely to orient toward the source of a weak stimulus if it is
coincident with another stimulus, even a weak one, of a different modality. MSE clearly
helps animals detect targets. It is suggested that the function of MSE is to enhance the
target-related activity of deep SC neurons.

Multiple observations from a variety of sensors increase the amount of
information available for automated tasks such as detection and localization of events in
the environment. Fusing inputs from multiple sensors involves transforming different
sensor readings into a common representational format, and then combining them in such
a way that the uncertainty associated with the individual sensor observations is reduced.

There are several components to the technological problem of multisensor
fusion that have parallels with the neurobiology of the SC as described above. For
example, sensor registration and alignment are issues in a multiple sensor environment.
So is the implementation of a suitable, common representational format. The SC appears
to solve both of these problems through the use of common topographical representations
in the form of sensory maps, which allow multisensory alignment and implementation of
a common representational format.

SUMMARY OF THE INVENTION

The present invention relates to a method of determining spatial target
probability using a model of multisensory processing by the brain. The method includes
acquiring at least two inputs from a location in a desired environment where a first target
is detected, and applying the inputs to a plurality of model units in a map corresponding to
a plurality of locations in the environment. A posterior probability of the first target at
each of the model units is approximated, and a model unit with a highest posterior
probability is found. A location in the environment corresponding to the model unit with
a highest posterior probability is chosen as the location of the next target.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing the superior colliculus of the brain;

FIG. 2 is a model representation of the superior colliculus of FIG. 1 in accordance
with an embodiment of the present invention;

FIG. 3 is a graph illustrating functions for approximating Bayes' rule in
accordance with an embodiment of the present invention;

FIG. 4 is a flowchart for illustrating a method for approximating Bayes'
rule in accordance with an embodiment of the present invention;

FIGS. 5 and 6 are flowcharts for illustrating a method for estimating Bayes'
rule in accordance with an embodiment of the present invention;

FIG. 7 is a diagram illustrating two stages of an unsupervised algorithm for
approximating target probability in accordance with an embodiment of the present
invention;

FIGS. 8-10 are flowcharts for illustrating a method for approximating target
probability using the unsupervised algorithm for approximating target probability; and

FIG. 11 is a block diagram of a self-aiming camera system incorporating the
present models for determining target probability.




DETAILED DESCRIPTION OF THE INVENTION 
Turning now to FIG. 2, the present invention relates to a model of the
superior colliculus (SC) 10 of a vertebrate brain 12 (shown in FIG. 1), which integrates
multisensory input and guides orienting movements. The model 13, as in the SC 10 of the
brain 12, is organized as a map 14 having a plurality of grids or units 16. Each unit 16
on the map 14 represents a collicular neuron that receives multisensory input from its
corresponding location in the environment. The model SC units 16 use sensory inputs
such as video (V) 18 and audio (A) 20, for example, to compute the probability that
something of interest, i.e., a target 22, has appeared in the surroundings.

The model 13 in accordance with one embodiment of the present invention
approximates Bayes' rule for computing the probability of a target. Specifically, the SC
units 16 in the map 14 approximate P(T|V,A), which is the conditional probability of a
target (T) given visual (V) and auditory (A) sensory input. Bayes' rule for computing
the probability of a target given V and A is as follows:

P(T|V,A) = [P(V,A|T) / P(V,A)] P(T)     (1)

Bayes' rule essentially computes the conditional posterior probability of the target given
sensory input P(T|V,A) by modifying the unconditional prior probability of the target P(T)
on the basis of sensory input V and A. The conditional probability P(V,A|T) is the
likelihood of observing some combination of V and A given the target. The unconditional
probability P(V,A) is the likelihood of observing the same combination of V and A under
any circumstances. Thus, Bayes' rule computes P(T|V,A) by multiplying P(T) by the ratio
of P(V,A|T) to P(V,A). For example, if in the absence of sensory input the expectation of
a target is 10%, then P(T) equals 0.1. If some input, say V=20 and A=25, is observed,
and if this combination is twice as likely when associated with a target as under general
circumstances, then the ratio of P(V,A|T) to P(V,A) is 2. On the basis of this sensory
input, Bayes' rule states that P(T|V,A) should equal 0.2. Thus, the prior target probability
P(T)=0.1 has been modified by the sensory input to the posterior target probability
P(T|V,A)=0.2. In other words, on the basis of the sensory input received, the chances of a
target are increased from 10% to 20%.
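The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the function name posterior and the absolute likelihood value 0.05 are hypothetical, since the text specifies only the ratio of P(V,A|T) to P(V,A), not the individual likelihoods.

```python
def posterior(prior: float, likelihood_given_target: float, likelihood_overall: float) -> float:
    """Bayes' rule: P(T|V,A) = [P(V,A|T) / P(V,A)] * P(T)."""
    return likelihood_given_target / likelihood_overall * prior


p_t = 0.1                    # prior target probability from the example
p_va = 0.05                  # hypothetical P(V,A); only the ratio below matters
p_va_given_t = 2.0 * p_va    # the input is twice as likely when a target is present
p_t_given_va = posterior(p_t, p_va_given_t, p_va)
print(p_t_given_va)
```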

Turning now to FIG. 3, the posterior probability P(T|V,A) computed using
Bayes' rule appears generally as an S-shaped curve 24 when plotted against A and V. For
certain likelihood distribution types (e.g. Poisson, Gaussian), the sigmoid curve
[y = 1/(1 + exp(-x))] can give P(T|V,A) exactly. For other types of unimodal likelihood
distributions, or in cases where the likelihood distribution type cannot be specified, the
sigmoid can provide a good approximation to the true posterior probability. Even a line
26 or a bounded line 28 that comes close over most of the S-curve can provide an
adequate approximation to the true posterior probability for certain applications. These
simple functions can be programmed into a computer, for example, and made to
approximate the posterior probability P(T|V,A) when V and A are entered.
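The statement that a sigmoid can give P(T|V,A) exactly for Poisson likelihoods can be illustrated numerically. The sketch below assumes, beyond what the text states, that V and A are conditionally independent Poisson counts, each with one mean rate when a target is present and another when it is absent. Under that assumption the log posterior odds are linear in V and A, so the posterior is a sigmoid of a weighted sum. All rates, the prior, the function names and the slope of the bounded line are hypothetical.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    return rate ** k * math.exp(-rate) / math.factorial(k)


def posterior_bayes(v, a, prior, rates_target, rates_spont):
    """Direct Bayes' rule with conditionally independent Poisson likelihoods (assumption)."""
    like_t = poisson_pmf(v, rates_target[0]) * poisson_pmf(a, rates_target[1])
    like_n = poisson_pmf(v, rates_spont[0]) * poisson_pmf(a, rates_spont[1])
    evidence = like_t * prior + like_n * (1.0 - prior)  # P(V,A)
    return like_t * prior / evidence


def posterior_sigmoid(v, a, prior, rates_target, rates_spont):
    """Same posterior written as y = 1/(1 + exp(-x)) with x a weighted sum of V, A and a bias."""
    w_v = math.log(rates_target[0] / rates_spont[0])
    w_a = math.log(rates_target[1] / rates_spont[1])
    bias = (math.log(prior / (1.0 - prior))
            - (rates_target[0] - rates_spont[0])
            - (rates_target[1] - rates_spont[1]))
    x = w_v * v + w_a * a + bias
    return 1.0 / (1.0 + math.exp(-x))


def bounded_line(x: float, slope: float = 0.25) -> float:
    """A line through (0, 0.5), clipped to [0, 1], as a cruder stand-in for the S-curve."""
    return min(1.0, max(0.0, 0.5 + slope * x))


# Hypothetical parameters: prior 0.1, mean counts 8 with a target and 2 without.
PRIOR, TARGET, SPONT = 0.1, (8.0, 8.0), (2.0, 2.0)
max_gap = max(
    abs(posterior_bayes(v, a, PRIOR, TARGET, SPONT) - posterior_sigmoid(v, a, PRIOR, TARGET, SPONT))
    for v in range(15) for a in range(15)
)
print(max_gap)
```

The slope of 0.25 used for the bounded line is the slope of the sigmoid at its midpoint, which is one simple way to make the line track the S-curve over its central range.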

Turning now to FIG. 4, the approximation model determines target
probability by first acquiring at least two separate inputs, such as video and audio input,
from the environment (block 30). The inputs are then preprocessed (block 32).
Preprocessing can take various forms depending upon the type of sensory input used. For
the self-aiming camera implementation described below, the video input is preprocessed
to detect moving objects, and the audio is preprocessed to detect sound sources. The
preprocessed sensory inputs are then applied to the SC units 16 (block 34) in the map 14
(best shown in FIG. 2). Inputs are then used to compute the approximation function
selected from one of the functions described above, for example, sigmoid or linear (block
36). Based on this computation, the model SC unit 16 with the highest value is found
(block 38). The location in the environment corresponding to this SC unit 16 is then
chosen as the location of the next target (block 40).
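Blocks 34 through 40 of FIG. 4 can be sketched as follows, assuming the preprocessed video and audio inputs are already available as two lists in spatial register. The weights w_v and w_a, the bias, and the function names are hypothetical placeholders, and the sigmoid is used here as the approximation function.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def most_probable_location(video, audio, w_v=1.0, w_a=1.0, bias=-6.0):
    """Apply V and A to each SC unit, evaluate the sigmoid, and pick the largest value."""
    if len(video) != len(audio):
        raise ValueError("video and audio maps must be in spatial register")
    # Block 36: each unit approximates P(T|V,A) from its own pair of inputs.
    probs = [sigmoid(w_v * v + w_a * a + bias) for v, a in zip(video, audio)]
    # Blocks 38 and 40: the unit with the highest value marks the next target location.
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical preprocessed inputs for a six-unit map.
video_in = [0.0, 0.0, 3.0, 0.0, 1.0, 0.0]
audio_in = [0.0, 1.0, 2.0, 0.0, 0.0, 0.0]
best_unit, unit_probs = most_probable_location(video_in, audio_in)
print(best_unit)
```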

The model 13 in accordance with another embodiment of the present
invention estimates Bayes' rule for calculating target probability by using
back-propagation which, as known in the art, is a supervised neural network learning algorithm.
Generally, back-propagation is used to train neural networks having input units, output
units, and units in between called hidden units. All units are sigmoidal. The input units
send their activity to the hidden units, and the hidden units send their activity to the output
units. The hidden and output units can also receive a bias input, which is an input that has
activity 1 all the time. All the connections between the input, output, and hidden units
have weights associated with them. Back-propagation adjusts the values of the weights in
order to achieve the desired output unit response for any input pattern. In the estimation
method, SC units 16 are the output units of neural networks that also have input and
hidden units. The back-propagation algorithm is used to iteratively adjust the weights of
the hidden and the output units to achieve the desired output.

Turning now to FIGS. 5 and 6, the estimation model includes a training
phase and an acquisition phase. The training phase as shown in FIG. 5 involves
positioning a target at a known but randomly chosen location (block 42), and acquiring
video and audio input from the target and preprocessing it (block 44). The input is
applied to the neural network and the responses of the SC units 16 are found (block 46).
Then desired responses for the SC units 16 (block 48) are generated. The desired
response is 1 if the known target location corresponds to the location of the SC unit 16,
and 0 for the other SC units. Subsequently, the difference between the desired and actual
SC unit responses, i.e. the error, is found (block 50). Thereafter, back-propagation is used
to adjust the network weights to reduce the error (block 52).
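The training phase of FIG. 5 can be sketched with a small network in plain Python. This is a minimal sketch under stated assumptions, not the implementation of the source: the network size, learning rate, weight initialization range, squared-error measure and the synthetic input generator make_example are all hypothetical. The generator stands in for blocks 42 through 48 by producing noisy video and audio inputs with a peak at a random location, together with the desired responses (1 at that location, 0 elsewhere).

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class SCNetwork:
    """Input -> hidden -> SC output units; all units sigmoidal, each with a bias input."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        # The last entry of every weight row is the weight of the bias input (activity 1).
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hid] + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return xb, hb, out

    def train_step(self, x, desired, rate=0.5):
        xb, hb, out = self.forward(x)                       # block 46
        # Block 50: error at each SC unit, scaled by the sigmoid derivative.
        d_out = [(d - y) * y * (1.0 - y) for d, y in zip(desired, out)]
        # Propagate the error back to the hidden units (bias entry excluded).
        d_hid = [hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w_out[k][j] for k in range(len(out)))
                 for j in range(len(hb) - 1)]
        # Block 52: adjust the weights to reduce the error.
        for k, row in enumerate(self.w_out):
            for j in range(len(row)):
                row[j] += rate * d_out[k] * hb[j]
        for j, row in enumerate(self.w_hid):
            for i in range(len(row)):
                row[i] += rate * d_hid[j] * xb[i]


def make_example(n_units, rng):
    """Hypothetical stand-in for blocks 42-48: inputs from a random location plus desired responses."""
    loc = rng.randrange(n_units)
    video = [rng.random() * 0.2 for _ in range(n_units)]
    audio = [rng.random() * 0.2 for _ in range(n_units)]
    video[loc] += 1.0
    audio[loc] += 1.0
    desired = [1.0 if k == loc else 0.0 for k in range(n_units)]
    return video + audio, desired


def mean_error(net, examples):
    total = 0.0
    for x, desired in examples:
        out = net.forward(x)[2]
        total += sum((d - y) ** 2 for d, y in zip(desired, out))
    return total / len(examples)


rng = random.Random(0)
N_UNITS = 6
net = SCNetwork(n_in=2 * N_UNITS, n_hidden=8, n_out=N_UNITS, rng=rng)
held_out = [make_example(N_UNITS, rng) for _ in range(50)]
error_before = mean_error(net, held_out)
for _ in range(2000):
    net.train_step(*make_example(N_UNITS, rng))
error_after = mean_error(net, held_out)
print(error_before, error_after)
```

In the acquisition phase the same forward pass would be run on new input and the SC unit with the largest response selected.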

Referring to FIG. 6, the acquisition phase includes acquiring raw video and
audio input and preprocessing it (block 54), and applying the input to the neural network
and finding the responses of the SC units 16 (block 56). Then, the SC unit with the
highest value is found (block 58). Using this information, the location corresponding to
the SC unit 16 with the highest response value is chosen as the location of the next target
(block 60).

In accordance with another embodiment of the present invention, an
unsupervised adaptive algorithm is used to determine the target probability. In the
unsupervised algorithm model, "cortical" input is used to influence the multisensory
responses of the SC unit 16 in a way that is consistent with neurobiology. It has been
shown in recent experimental work by others that multisensory enhancement in real SC
neurons of the brain depends not only upon sensory input but also upon input from the
cortex of the brain. Likewise, the present adaptive algorithm incorporates influences
other than direct sensory inputs to approximate target probability.

Turning now to FIG. 7, the present unsupervised algorithm for
approximating target probability includes two stages. The first stage involves an
unsupervised learning mechanism that increases the amount of information transmitted
from the sensory inputs, audio (A) and video (V), for example, to the SC unit 16 of the
model SC. This mechanism is known in the art as the Kohonen mechanism, which has
been shown to increase information transmission in neural networks. The Kohonen
mechanism is unsupervised, meaning that it would take the sensory inputs (such as audio
and video) and automatically adjust the model SC to increase the amount of information
that is transmitted to it from the input. This is accomplished by adjusting the connection
weights from the V and A inputs to the SC units 16 in such a way that individual SC units
become specialized for specific inputs. For example, the Kohonen algorithm might cause
one SC unit 16 to become specialized for video input from the extreme left side of the
environment, and another to become specialized for audio input coming straight ahead.
For very certain (not noisy) inputs, all the SC units 16 will become specialized for
particular locations in the environment, and almost all of them will become specific for
one modality or the other (V or A). The SC units 16 in this case can give a near maximal
amount of information about the input. These units 16 can indicate not only where the
target is but also of what modality it is.

If the inputs are not so certain (noisy), then the Kohonen algorithm will
cause more of the SC units 16 to become bimodal and respond to both V and A. These
SC units would be less informative because they could indicate where the target is but not
of which modality it is. Thus, the Kohonen algorithm will do the best it can with the input
it is given to increase the amount of information that is transmitted to the SC units 16
from the V and A input units.

In the second stage of the present unsupervised algorithm, a separate set of 
cortical units 62, representing cortical neurons, learns in an unsupervised way to modulate 
the strength of the sensory inputs to the model SC units 16. The cortical units 62 can be 
selective for any type of stimulus such as video and audio, or other specialized units such 
as those that are specific for images of automobiles, for example, or other types of targets
of particular interest. 

Learning at the second stage is based on correlation between the activities
of the model SC 13 and cortical units 62, and on anti-correlation between cortical units
and the sensory inputs. The learning strategy at the second stage is based on the idea that
the model SC units 16 compute target probability. For a multisensory neuron of the brain,
target probability is much higher if inputs of two separate modalities are active together
than if only one or the other is active alone. Another input of a completely separate
modality can greatly increase target probability, even if it is weak. The goal for the
cortical units 62, then, is to enhance the sensory inputs to model SC units 16 of separate
modalities.

Cortical units modulate the sensory inputs to the model SC units by
multiplying their weights. For example, the video input to an SC unit 16 would be
c_v w_v V, where c_v is the amount of cortical modulation of that sensory weight w_v. In the
learning process, an active cortical unit 62 will increase its modulation of a sensory input
to an SC unit 16 if the SC unit is also active but the sensory input is inactive. If the SC
unit 16 and the sensory input are both active then the cortical unit 62 will decrease its
modulation of the sensory input. For example, when an SC unit 16 receives multisensory
video and audio input after stage one training, and a target appears that provides a video
input but produces no audio input, that SC unit will be active because it receives both
video and audio input and the video input is active. A cortical unit 62 sensitive to video
will also be active. Because the activity of the SC unit 16 and the cortical unit 62 are
correlated, the cortical unit will change its level of modulation of the sensory inputs
accordingly. Specifically, the cortical unit 62 will decrease its
modulation of the video input (because the cortical unit and the video input are correlated)
but increase its modulation of the auditory input (because the cortical unit and the auditory
input are anti-correlated).
Turning now to FIG. 8, the preferred embodiment for implementing the
two-stage algorithm for approximating target probability involves iterative procedures
that begin after certain parameters of the model have been set. The structure of the neural
network model is determined in block 64, in which the number of SC units 16 is set, and
the bias weight and sensitivity of each SC unit are assigned. All SC units 16 in the
two-stage model are sigmoidal, where output y is related to input x by: y = 1/(1 + exp(-gx)).
The input x is the weighted sum of its inputs from V and A and from the bias. The bias
weight w_b is the same fixed constant for all SC units 16. The sensitivity g is another fixed
constant that is the same for all SC units 16. These fixed constants (w_b and g), along with
the number of SC units 16, are set in block 64.



Further, the parameters of iterative learning are set (block 66). Stage-one 
and stage-two learning are both iterative, where small changes to network weights are 
made at each iteration. The learning rate parameters, one for each stage of learning, are 
set to make these adjustments of the appropriate size. The neighborhood size is pertinent 
to stage-one learning. It determines how many SC units 16 adjacent to the winning SC 
unit are also trained (see block 80 below). The numbers of training iterations for each 
stage, both learning rates, and the neighborhood size for stage one are also set in block 66. 

Before stage one and stage two training can begin, the threshold and cutoff
must be set and the ascending and descending weights must be initialized (block 68). The
ascending weights are the weights of the connections from the sensory inputs V and A to
the SC units 16. These weights are initialized to positive, uniformly distributed random
numbers. The ascending weights are trained during stage one. Any weight that has not
reached a level greater than the threshold following stage-one training is set to zero. The
descending weights are the weights from the cortical units 62 that modulate the
connections from the sensory inputs A and V, as explained above. These weights are
initialized to zero and are trained during stage two. Stage two training, described below,
depends in part upon correlation between the activity of SC units 16 and cortical units 62.
After stage one training, a model SC unit 16 is considered to be activated by sensory input
A and/or V if its response to sensory input exceeds the cutoff. The threshold and cutoff
parameters are set, and the ascending and descending weights are initialized in block 68.

Once the parameters have been set and the weights initialized (blocks 64,
66, and 68) the stage one learning process (described below in more detail) is performed
(block 70). Stage one operates on the ascending weights. Immediately following
stage-one training the ascending weights are thresholded, such that any ascending weight with
value less than the threshold is set to zero (block 72). Then the stage two learning process
(described below in more detail) is performed (block 74).
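Two of the building blocks of FIG. 8 are simple enough to sketch directly: the sigmoidal SC unit with sensitivity g and bias weight w_b, and the thresholding of ascending weights in block 72. The function names and all numeric values below are hypothetical and serve only as illustration.

```python
import math


def sc_response(v_inputs, a_inputs, w_v, w_a, w_b, g):
    """y = 1 / (1 + exp(-g * x)), where x is the weighted sum of the V, A and bias inputs."""
    x = sum(w * v for w, v in zip(w_v, v_inputs))
    x += sum(w * a for w, a in zip(w_a, a_inputs))
    x += w_b  # the bias input has activity 1 at all times
    return 1.0 / (1.0 + math.exp(-g * x))


def threshold_weights(weights, threshold):
    """Block 72: set to zero any ascending weight whose value is less than the threshold."""
    return [w if w >= threshold else 0.0 for w in weights]


# Hypothetical ascending weights of one SC unit after stage-one training.
trained = [0.05, 0.40, 0.20, 0.01]
pruned = threshold_weights(trained, threshold=0.2)
resting = sc_response([0.0, 0.0], [0.0, 0.0], [0.5, 0.5], [0.5, 0.5], w_b=0.0, g=2.0)
print(pruned, resting)
```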

Referring now to FIG. 9, each iteration of the stage one learning process
begins by acquiring and preprocessing the video and audio inputs from a randomly
positioned target (block 76). These V and A inputs are sent to the SC units 16 over the
ascending connections. As explained above, the sigmoidal SC units 16 use the weighted
sum of these inputs to compute their responses (block 78). Then the SC unit 16 with the
maximal response is found (block 80). The unit with the maximal response is referred to
as the 'winning' SC unit. The ascending weights of the winning SC unit 16 and its
neighbors are trained using Kohonen's rule (block 82). The neighbors of an SC unit are
simply the other units that are near it in the network. The number of neighbors trained in
stage one is determined by the neighborhood-size parameter set in block 66 (see FIG. 8).
Kohonen's rule basically adjusts the ascending weights to the winning SC unit and its
neighbors so that they become even more specialized for the current input.
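One stage-one iteration can be sketched as below. The text does not spell out the form of Kohonen's rule, so this sketch assumes the common self-organizing-map update, in which each trained weight moves a fraction `rate` of the way toward the current input. The one-dimensional neighborhood, the parameter values and the function names are likewise hypothetical.

```python
import math
import random


def sigmoid(x: float, g: float = 1.0) -> float:
    return 1.0 / (1.0 + math.exp(-g * x))


def stage_one_step(weights, x, w_b, g, rate, neighborhood):
    """One iteration: responses (block 78), winner (block 80), Kohonen update (block 82).

    weights[k] holds the ascending weights of SC unit k, one per entry of the input x.
    """
    responses = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + w_b, g) for row in weights]
    winner = max(range(len(responses)), key=responses.__getitem__)
    lo = max(0, winner - neighborhood)
    hi = min(len(weights) - 1, winner + neighborhood)
    for k in range(lo, hi + 1):
        # Assumed form of Kohonen's rule: move each weight toward the current input.
        weights[k] = [w + rate * (xi - w) for w, xi in zip(weights[k], x)]
    return winner


def distance(row, x):
    return math.sqrt(sum((w - xi) ** 2 for w, xi in zip(row, x)))


rng = random.Random(1)
# Positive, uniformly distributed random initial ascending weights, as in block 68.
asc = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(5)]
sample = [1.0, 0.0, 1.0, 0.0]  # hypothetical preprocessed V and A input
asc_before = [row[:] for row in asc]
win = stage_one_step(asc, sample, w_b=-1.0, g=1.0, rate=0.1, neighborhood=1)
print(win, distance(asc_before[win], sample), distance(asc[win], sample))
```

Repeating the step over many randomly positioned targets is what lets individual units specialize for particular inputs.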
Turning to FIG. 10, each iteration of the stage-two learning process begins by
acquiring and preprocessing the video and audio inputs from a randomly positioned
target, and using that input to determine cortical activation (block 84). The term 'cortical'
is meant to indicate that these units 62 are at a high level, as they are in the cortex of the
mammalian brain, and the properties of the cortical units 62 can vary over a very broad
range. For example, the cortical units can act as pattern recognizers, and can be
specialized for particular types of targets like humans or airplanes. So far as applied here,
the cortical units 62 simply register the modality of the target, whether it is visual,
auditory, or both. A visual cortical unit 62, for example, would be active whenever the
video input is active. Block 84 indicates that the activity of the cortical units is dependent
upon the video and audio inputs. The cortical units 62 send descending connections to the
model SC units 16, and more specifically, to the connections onto the SC units from the V
and A sensory inputs. As explained above, an active cortical unit 62 can modulate the
weights of the ascending connections by multiplying the value of the ascending weight by
that of the descending weight (block 86). After any cortical descending modulation of
ascending weights is taken into account, the responses of the SC units to the ascending
input are computed (block 88).

Then the SC units 16 with responses less than cutoff are found and set to
zero (block 90). Descending weights of SC units 16 are then trained using the following
triple correlation rule (block 92):

If an SC unit 16 and a cortical unit 62 are both active, then
    increase the descending weights to inactive ascending input synapses, and
    decrease the descending weights to active ascending input synapses.
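The triple correlation rule can be sketched for a single SC unit and a single cortical unit as follows. The text gives the direction of each weight change but not its size, so the fixed additive step `rate`, the cutoff value and the function name are assumptions made only for illustration.

```python
def triple_correlation_update(descending, sc_response, cutoff, cortical_active,
                              inputs_active, rate=0.1):
    """Update the descending weights onto the ascending synapses of one SC unit.

    descending[i] modulates ascending input i; inputs_active[i] says whether that input is active.
    """
    sc_active = sc_response > cutoff  # block 90: responses below the cutoff count as inactive
    if not (sc_active and cortical_active):
        return list(descending)       # the rule applies only when both units are active
    return [d - rate if active else d + rate
            for d, active in zip(descending, inputs_active)]


# Example from the text: a target gives video input but no audio input, the SC unit
# is active, and a video-sensitive cortical unit is active.
start = [0.0, 0.0]  # descending weights (video synapse, audio synapse), initialized to zero
after = triple_correlation_update(start, sc_response=0.8, cutoff=0.5,
                                  cortical_active=True, inputs_active=[True, False])
print(after)
```

The result lowers the modulation of the active video synapse and raises that of the inactive audio synapse, matching the directions described in the text.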

Turning now to FIG. 11, the above-described embodiments of the present
invention for computing the probability of a target are preferably implemented in a
self-aiming camera (SAC) system 94 that automatically aims a video camera 96 and a
microphone 98 at the most probable target 100 in the environment. The camera 96 and
the microphone 98 are mounted on a movable platform 102, which is controlled by a
motion controller 104. In the preferred embodiment, the models described above for
approximating or estimating Bayes' rule are implemented on a host PC 106. It should be
understood, however, that the two-stage adaptive algorithm for approximating target
probability may also be implemented in the host PC 106.

The host PC 106 receives audio (A) signals from a separate microphone
array 108, and video (V) signals from a stationary camera 110, preferably a wide-angle
type. The stationary camera 110 and the microphone array 108 are locked on axis with
the rotatable camera 96 and the directional microphone 98. The audio signals from the
microphone array 108 and the video signals from the stationary camera 110
are first digitized by ADCs 112 and 114, respectively, and sent to the PC 106 to be used in
approximating or estimating target probability.

The digitized audio signals from the microphone array 108 consist of 
approximately 0.25 seconds of data from each of the microphones. The two signals are 
correlated by the PC 106 to localize the direction to a source measured by relative time of 
arrival. The correlation is preferably performed using a standard, FFT-based correlation 
algorithm. Improved performance is achieved by correlating the signals in the left and 
right microphones 108 that immediately follow abrupt onsets in both audio signals. The 
onset-directed technique is known in the art. The computed correlation is low-pass 
filtered, and the time offset corresponding to the maximal, smoothed correlation is chosen 
to determine the direction to the sound source. In the SAC system 94, the environment is 
represented as a one-dimensional array of 60 elements, for example. The PC 106 
produces a space-map vector of this dimensionality. 
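The time-offset estimate at the core of this audio preprocessing can be sketched with an FFT-based cross-correlation. The sketch below uses a small recursive radix-2 FFT written with the standard library so that it is self-contained. It omits the onset detection and low-pass filtering mentioned above, and the function names and test signals are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unnormalized."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def delay_samples(left, right):
    """Lag (in samples) maximizing the cross-correlation; positive if `right` lags `left`."""
    size = 1
    while size < len(left) + len(right):  # zero-pad so circular wrap-around cannot occur
        size *= 2
    fl = fft([complex(v) for v in left] + [0j] * (size - len(left)))
    fr = fft([complex(v) for v in right] + [0j] * (size - len(right)))
    # Cross-correlation theorem: corr = IFFT(conj(FFT(left)) * FFT(right)).
    corr = fft([a.conjugate() * b for a, b in zip(fl, fr)], inverse=True)
    best = max(range(size), key=lambda m: corr[m].real)
    return best if best <= size // 2 else best - size  # upper half holds negative lags


# Hypothetical signals: the same pulse arrives 3 samples later at the right microphone.
pulse = [1.0, 2.0, 3.0, 2.0, 1.0]
left_sig = [0.0] * 5 + pulse + [0.0] * 10
right_sig = [0.0] * 8 + pulse + [0.0] * 7
lag = delay_samples(left_sig, right_sig)
print(lag)
```

Converting the lag to a direction requires the sampling rate and microphone spacing, which the text does not specify, so that step is left out.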

The video input signals from the stationary camera 110 consist, for 
example, of a 640x480 wide-angle monochrome image, updated at a rate of 40 frames per 
second (fps). The video frames are inherently spatially mapped (2-dimensional). Deep 
SC neurons in the brain (see FIG. 1) respond preferentially to moving or time varying 
visual inputs. This is simulated in the PC 106 using a motion detection algorithm, which 
takes as input two images that are separated by one capture time. For each pixel, both the 
spatial and temporal intensity gradients are calculated. These are combined using the 
image brightness constancy equation to determine the normal component of optical flow 
at each pixel. Optical flow is used as an estimate of motion. The pixel containing the 
maximal, smoothed optical flow value is chosen to determine the location of the moving
visual input source. The resolution of the output of the video motion algorithm is reduced 
to 60, for example. 
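The motion-detection step can be sketched as follows, with images represented as lists of rows of intensities. The brightness constancy equation, Ix*u + Iy*v + It = 0, fixes only the flow component along the intensity gradient, whose magnitude is -It/|grad I|. The central-difference gradients, the eps guard, the strip-wise reduction to a 60-element map and the function names are assumptions of this sketch, and the smoothing mentioned above is omitted.

```python
import math


def normal_flow(frame0, frame1, eps=1e-6):
    """Per-pixel normal component of optical flow from two frames one capture time apart."""
    h, w = len(frame0), len(frame0[0])
    flow = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (frame0[y][x + 1] - frame0[y][x - 1]) / 2.0  # spatial gradient, x
            iy = (frame0[y + 1][x] - frame0[y - 1][x]) / 2.0  # spatial gradient, y
            it = frame1[y][x] - frame0[y][x]                  # temporal gradient
            grad = math.hypot(ix, iy)
            # Brightness constancy gives the flow along the gradient as -It / |grad I|.
            flow[y][x] = -it / grad if grad > eps else 0.0
    return flow


def column_profile(flow, n_bins=60):
    """Reduce a 2-D flow map to n_bins elements: largest |flow| in each vertical strip."""
    w = len(flow[0])
    profile = [0.0] * n_bins
    for row in flow:
        for x, value in enumerate(row):
            b = x * n_bins // w
            profile[b] = max(profile[b], abs(value))
    return profile


# Hypothetical test pattern: an intensity ramp shifted 2 pixels to the right between frames.
ramp0 = [[float(x) for x in range(8)] for _ in range(5)]
ramp1 = [[float(x - 2) for x in range(8)] for _ in range(5)]
ramp_flow = normal_flow(ramp0, ramp1)
print(ramp_flow[2][3])
```

The location of the moving source would then be taken as the index of the largest entry of the reduced profile.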

A model of the superior colliculus (SC) is implemented in the PC 106 as an
array of 60 SC units 16, each representing a deep SC neuron. Each unit 16 receives,
directly or indirectly, one input from each of the 60 preprocessed audio and video signals
in spatial register. Initially, each SC unit 16 has a non-overlapping receptive field of one
pixel, and approximates or estimates Bayes' rule to determine the probability that a target
has appeared in its receptive field.

The selection process is implemented by choosing the SC unit 16 that has
the largest response to its input. Because the SC units 16 are in spatial register with their
inputs, localization of the target is determined by the location of the chosen SC unit in the
map. Aiming takes place by moving the rotating
platform 102 to the coordinate in the environment corresponding to the chosen SC unit,
thereby allowing the target 100 to be viewed by the operator through a monitor.

If target probability is obtained by estimating Bayes' rule using
back-propagation, an array of computer-controlled buzzer/flasher pairs (not shown), spaced
every 15 degrees, for example, is used to provide the sensory stimuli for back-propagation
training. At each training cycle, one location is chosen at random, and the buzzer and the
flasher at that location are activated. The 60 preprocessed audio and video signals are
temporally summed or averaged over a window of 1 second and applied as input to the
model. The inputs are applied to the model SC units directly, or indirectly through a
network of hidden units. The location of the source is specified as a 60 element desired
output vector of 59 zeros, and a one at the location in the vector corresponding to the
location of the source. The weights are all trained with one cycle of back-propagation,
and the process is repeated with a source at a newly chosen, random location.

After training, the inputs are preprocessed as described above over the
1-second window and then applied in spatial register to the SC model. Each SC unit 16
then estimates, on the basis of its video and audio inputs, the Bayesian probability that the
source is present at its corresponding location in the environment, which simulates MSE.
The location of the model SC unit with the largest response is then chosen as the location
of the most probable target, and the camera 96 and the microphone 98 are aimed in that
direction. The SAC system 94 chooses as targets those objects in the environment that
move and make noise, which covers most of the targets actually chosen by the system
guiding saccadic eye movements in animals.
From the foregoing description, it should be understood that methods for
modeling the superior colliculus of the brain have been shown and described which have
many desirable attributes and advantages. These models in accordance with the present
invention approximate or estimate Bayes' rule to determine the target probability in the
environment.

While various embodiments of the present invention have been shown and
described, it should be understood that other modifications, substitutions and alternatives
are apparent to one of ordinary skill in the art. Such modifications, substitutions and
alternatives can be made without departing from the spirit and scope of the invention,
which should be determined from the appended claims.

Various features of the invention are set forth in the appended claims.